Statistical trajectory models for phonetic recognition

Authors

  • William Goldenthal
  • James R. Glass

Abstract

The main goal of this work is to develop an alternative methodology for acoustic-phonetic modelling of speech sounds. The approach uses a segment-based framework to capture the dynamic behavior and statistical dependencies of the acoustic attributes used to represent the speech waveform. Temporal behavior is modelled explicitly by creating dynamic tracks of these attributes and by estimating the spatio-temporal correlation structure of the resulting errors. The tracks serve as templates from which synthetic segments of the acoustic attributes are generated. A hypothesized phonetic segment is then scored according to the error between its measured acoustic attributes and the synthetic segment generated for each phonetic model. Phonetic contextual influences are accounted for in two ways. First, context-dependent biphone tracks are created for each phonetic model; these tracks are then merged as needed to generate triphone tracks. The error statistics are pooled over all the contexts of each phonetic model, which allows a large number of contextual models (e.g., 2,500) to be created without compromising the robustness of the statistical parameter estimates. The resulting triphone coverage is over 99.5%. The second method creates tracks of the transitions between phones. By clustering these tracks, complete models of over 200 "canonical" transitions are constructed. The transition models help in two ways. First, the transition scores are incorporated into the scoring framework to help determine the phonetic identity of the two phones involved. Second, they are used to determine likely segment boundaries within an utterance, which reduces the search space during phonetic recognition. Phonetic classification experiments demonstrate the importance of the temporal correlation information in the speech signal.
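The scoring idea described above (generate a synthetic segment from a phone's track, then evaluate the error under a model of its spatio-temporal correlation) can be illustrated with a small sketch. It rests on assumptions that are not taken from the thesis: tracks and segments are lists of frames of acoustic attributes, both are linearly resampled to a fixed number of frames `M`, and the flattened error vector is scored under a full-covariance Gaussian whose mean and covariance are pooled over training tokens, with a small ridge added for stability. The helper names (`resample`, `residual`, `pooled_gaussian`, `score_segment`), the linear interpolation, the fixed `M`, and the toy data are all hypothetical choices for illustration, not the author's actual procedure.

```python
import math
import random

def resample(frames, m):
    """Linearly interpolate a sequence of D-dim frames onto m evenly spaced points (assumed scheme)."""
    n = len(frames)
    out = []
    for i in range(m):
        # Position of output point i on the input frame-index axis [0, n-1].
        t = i * (n - 1) / (m - 1) if m > 1 else 0.0
        lo = min(int(t), n - 1)
        hi = min(lo + 1, n - 1)
        w = t - lo
        out.append([(1.0 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out

def residual(segment, track, m):
    """Flattened error between a measured segment and the synthetic segment generated from a track."""
    meas = resample(segment, m)
    synth = resample(track, m)
    return [x - y for fm, fs in zip(meas, synth) for x, y in zip(fm, fs)]

def pooled_gaussian(residuals, ridge=1e-3):
    """Mean and full (time x attribute) covariance of residuals pooled over tokens/contexts."""
    n, k = len(residuals), len(residuals[0])
    mean = [sum(r[j] for r in residuals) / n for j in range(k)]
    cov = [[0.0] * k for _ in range(k)]
    for r in residuals:
        d = [r[j] - mean[j] for j in range(k)]
        for a in range(k):
            for b in range(k):
                cov[a][b] += d[a] * d[b] / n
    for a in range(k):
        cov[a][a] += ridge  # hypothetical regulariser keeping cov positive definite
    return mean, cov

def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    k = len(a)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][p] * L[j][p] for p in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def gaussian_logpdf(x, mean, cov):
    """Log density of x under N(mean, cov), computed via a Cholesky factorisation."""
    k = len(x)
    L = cholesky(cov)
    # Forward substitution: solve L y = x - mean, so y.y is the Mahalanobis distance.
    y = [0.0] * k
    for i in range(k):
        y[i] = (x[i] - mean[i] - sum(L[i][p] * y[p] for p in range(i))) / L[i][i]
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(k))
    return -0.5 * (sum(v * v for v in y) + logdet + k * math.log(2.0 * math.pi))

def score_segment(segment, track, mean, cov, m):
    """Score a hypothesised segment against one phonetic model's track and error statistics."""
    return gaussian_logpdf(residual(segment, track, m), mean, cov)

# Hypothetical usage: a 3-frame, 2-attribute track and noisy training tokens of varying length.
random.seed(0)
track = [[0.0, 1.0], [1.0, 0.5], [2.0, 0.0]]
M = 4  # fixed number of resampled frames (an assumption of this sketch)
train = []
for _ in range(200):
    n = random.randint(3, 8)
    token = [[v + random.gauss(0.0, 0.1) for v in f] for f in resample(track, n)]
    train.append(residual(token, track, M))
mean, cov = pooled_gaussian(train)
close = resample(track, 6)                   # segment that follows the track
far = [[v + 1.0 for v in f] for f in close]  # segment offset from the track
print(score_segment(close, track, mean, cov, M) > score_segment(far, track, mean, cov, M))  # → True
```

Pooling the residuals of many tokens into one covariance mirrors, in spirit, the abstract's point that error statistics shared across contexts keep the parameter estimates robust; the full covariance is what lets correlations between frames (not only between attributes) influence the score.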
A complete phonetic recognition system incorporating all the different model elements is described. Both context-independent and context-dependent recognition experiments are performed using the TIMIT acoustic-phonetic corpus. The measured phonetic accuracy is virtually identical to the best reported result achieved with hidden Markov models, the most successful speech recognizers developed to date.

Thesis Supervisor: Dr. James Glass
Title: Research Scientist, Laboratory for Computer Science

'Twas brillig, and the slithy toves
Did gyre and gimble in the wabe.
- Lewis Carroll

Similar resources

Whither Linguistic Interpretation of Acoustic Pronunciation Variation

Recent research suggests that modelling pronunciation variation is more appropriate at the syllable level than at the level of context-dependent phones. Due to the large number of factors affecting syllable pronunciation, the creation of multi-path topologies is necessary. Previous research on multi-path models in connected digit recognition has proved trajectory clustering to be an attractive...

Phonetic speaker recognition using maximum-likelihood binary-decision tree models

Recent work in phonetic speaker recognition has shown that modeling phone sequences using n-grams is a viable and effective approach to speaker recognition, primarily aiming at capturing speaker-dependent pronunciation and also word usage. This paper describes a method involving binary-tree-structured statistical models for extending the phonetic context beyond that of standard n-grams (particu...

Modeling trajectories in the HMM framework

Most state-of-the-art statistical speech recognition systems use hidden Markov models (HMM) for modeling the speech signal. However, limited by the assumption of conditional independence of observations given the state sequence, current HMMs poorly model the trajectory constraints in speech. In [1], we introduced the parallel path HMM, where each phonetic unit is represented by a parallel coll...

A dynamic, feature-based approach to the interface between phonology and phonetics for speech modeling and recognition

An overview of a statistical paradigm for speech recognition is given where phonetic and phonological knowledge sources, drawn from the current understanding of the global characteristics of human speech communication, are seamlessly integrated into the structure of a stochastic model of speech. A consistent statistical formalism is presented in which the submodels for the discrete, feature-bas...

Speech Recognition using Acoustic Landmarks and Binary Phonetic Feature Classifiers

In spite of decades of research, Automatic Speech Recognition (ASR) is far from reaching the goal of performance close to Human Speech Recognition (HSR). One of the reasons for the unsatisfactory performance of state-of-the-art ASR systems, which are based largely on Hidden Markov Models (HMMs), is the inferior acoustic modeling of low level or phonetic level linguistic information in the speech...

Publication date: 1994